Sparse Communication for Distributed Gradient Descent
Authors
Abstract
We make distributed stochastic gradient descent faster by exchanging sparse updates instead of dense updates. Gradient updates are positively skewed, as most updates are near zero, so we map the smallest 99% of updates (by absolute value) to zero and then exchange sparse matrices. This method can be combined with quantization to further improve compression. We explore different configurations and apply them to neural machine translation and MNIST image classification tasks. Most configurations work on MNIST, whereas some configurations reduce the convergence rate on the more complex translation task. Our experiments show a speed-up of up to 49% on MNIST and 22% on NMT without damaging the final accuracy or BLEU.
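As a rough illustration of the dropping step, here is a minimal NumPy sketch; the function name drop_smallest and the dense-array interface are illustrative, and the serialization and exchange of the resulting sparse matrices between workers is omitted:

```python
import numpy as np

def drop_smallest(grad, drop_ratio=0.99):
    """Zero out the drop_ratio fraction of entries with the smallest
    absolute value, keeping the large (informative) updates intact."""
    flat = np.abs(grad).ravel()
    k = int(flat.size * drop_ratio)
    if k == 0:
        return grad.copy()
    # k-th smallest absolute value; ties may drop slightly more than k entries.
    threshold = np.partition(flat, k - 1)[k - 1]
    return np.where(np.abs(grad) > threshold, grad, 0.0)

g = np.random.randn(1000, 1000)
sparse_g = drop_smallest(g)
print(f"nonzero fraction: {np.count_nonzero(sparse_g) / sparse_g.size:.4f}")
```

Only the surviving ~1% of entries need to cross the network, as (index, value) pairs or another sparse encoding, which is where the bandwidth saving comes from.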
Similar resources
Trading Computation for Communication: Distributed Stochastic Dual Coordinate Ascent
We present and study a distributed optimization algorithm that employs a stochastic dual coordinate ascent method. Stochastic dual coordinate ascent methods enjoy strong theoretical guarantees and often outperform stochastic gradient descent methods on regularized loss minimization problems, yet little effort has been made to study them in a distributed framework. We ...
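For concreteness, a minimal single-machine sketch of one SDCA epoch for ridge regression; the closed-form delta step is specific to the squared loss, and how data and dual variables are partitioned across machines in the distributed setting is beyond this sketch:

```python
import numpy as np

def sdca_ridge(X, y, lam=0.1, epochs=10, seed=0):
    """SDCA for min_w (1/n)*sum_i 0.5*(x_i.w - y_i)^2 + (lam/2)*||w||^2.
    Each step maximizes the dual over one coordinate in closed form and
    keeps the primal iterate w = (1/(lam*n))*sum_i alpha_i*x_i in sync."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    alpha = np.zeros(n)
    w = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):
            # Closed-form dual coordinate step for the squared loss.
            delta = (y[i] - X[i] @ w - alpha[i]) / (1.0 + (X[i] @ X[i]) / (lam * n))
            alpha[i] += delta
            w += (delta / (lam * n)) * X[i]
    return w
```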
Sparse Diffusion Steepest-Descent for One Bit Compressed Sensing in Wireless Sensor Networks
This letter proposes a sparse diffusion steepest-descent algorithm for one-bit compressed sensing in wireless sensor networks. The approach exploits the diffusion strategy from distributed learning in the one-bit compressed sensing framework. To estimate a common sparse vector cooperatively from only the sign of measurements, steepest descent is used to minimize the suitable global and local con...
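A toy sketch of the adapt-combine-sparsify pattern such diffusion algorithms follow; the fully connected network, the hinge-type sign-consistency loss, and the soft-thresholding step are this sketch's assumptions, not necessarily the letter's exact choices:

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, m, n = 4, 40, 60
x_true = np.zeros(n)
x_true[rng.choice(n, 5, replace=False)] = rng.standard_normal(5)

# Each sensor observes only the SIGN of its linear measurements.
A = [rng.standard_normal((m, n)) for _ in range(n_nodes)]
y = [np.sign(Ai @ x_true) for Ai in A]

x = [np.zeros(n) for _ in range(n_nodes)]
mu, lam = 0.02, 0.005  # step size and sparsity threshold
for _ in range(300):
    # Adapt: local steepest-descent step on the sign-consistency loss.
    psi = []
    for Ai, yi, xi in zip(A, y, x):
        wrong = (yi * (Ai @ xi)) < 0          # sign-violating measurements
        psi.append(xi + mu * (Ai.T @ (yi * wrong)))
    # Combine: diffuse intermediate estimates over the network.
    avg = sum(psi) / n_nodes
    # Sparsify: soft-threshold toward a common sparse estimate.
    x = [np.sign(avg) * np.maximum(np.abs(avg) - lam, 0.0) for _ in range(n_nodes)]
```

Note that sign-only measurements fix at most the direction of the vector, so recovery is up to scale.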
Preserving communication bandwidth with a gradient coding scheme
Large-scale machine learning involves the communication of gradients, and large models often saturate the communication bandwidth doing so. I implement an existing scheme, quantized stochastic gradient descent (QSGD), to reduce the communication bandwidth. This requires a distributed architecture, and we choose to implement a parameter server that uses the Message Passing Interfac...
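The heart of QSGD is a stochastic, unbiased quantizer; a small sketch under the usual s-level formulation (the parameter server and MPI plumbing are omitted, and the function name is illustrative):

```python
import numpy as np

def qsgd_quantize(v, s=4, rng=None):
    """Quantize each coordinate to its sign and one of s+1 levels of
    |v_i| / ||v||_2, rounding stochastically so that E[q(v)] = v."""
    rng = rng if rng is not None else np.random.default_rng()
    norm = np.linalg.norm(v)
    if norm == 0.0:
        return np.zeros_like(v)
    scaled = np.abs(v) / norm * s           # position in [0, s]
    lower = np.floor(scaled)
    # Round up with probability equal to the fractional part.
    level = lower + (rng.random(v.shape) < (scaled - lower))
    return norm * np.sign(v) * level / s

g = np.random.randn(8)
print(g, qsgd_quantize(g), sep="\n")
```

Because the quantized gradient equals the original in expectation, standard SGD convergence arguments carry over, while each coordinate costs only a sign and a few level bits plus one shared norm.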
Network Newton–Part II: Convergence Rate and Implementation
The use of network Newton methods for the decentralized optimization of a sum cost distributed through agents of a network is considered. Network Newton methods reinterpret distributed gradient descent as a penalty method, observe that the corresponding Hessian is sparse, and approximate the Newton step by truncating a Taylor expansion of the inverse Hessian. Truncating the series at K terms yi...
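A sketch of the truncation idea in assumed notation: suppose the penalized objective's Hessian splits as H = D − B, with D block diagonal (locally computable) and B sparse, coupling only neighbors; when the spectral radius of X below is less than one, the inverse expands as a geometric series that can be cut off at K terms:

```latex
\[
  H^{-1} = D^{-1/2}\,(I - X)^{-1}\,D^{-1/2},
  \qquad X = D^{-1/2} B\, D^{-1/2},
\]
\[
  \hat{H}^{-1}_{(K)} = D^{-1/2} \sum_{k=0}^{K} X^{k}\, D^{-1/2},
  \qquad d^{(K)} = -\hat{H}^{-1}_{(K)}\, g .
\]
```

Each additional term reaches one hop further into the network, so larger K trades extra rounds of neighbor communication (and local computation) for a more accurate Newton direction.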
An Asynchronous Distributed Proximal Gradient Method for Composite Convex Optimization
We propose a distributed first-order augmented Lagrangian (DFAL) algorithm to minimize the sum of composite convex functions, where each term in the sum is a private cost function belonging to a node, and only nodes connected by an edge can directly communicate with each other. This optimization model abstracts a number of applications in distributed sensing and machine learning. We show that a...
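In assumed notation, the model is consensus optimization over a graph with edge set E, where each node i privately holds a smooth part f_i and a possibly nonsmooth part rho_i (e.g., an l1 regularizer):

```latex
\[
  \min_{x \in \mathbb{R}^{n}} \sum_{i=1}^{N} \bigl(\rho_i(x) + f_i(x)\bigr)
  \;\Longleftrightarrow\;
  \min_{x_1,\dots,x_N} \sum_{i=1}^{N} \bigl(\rho_i(x_i) + f_i(x_i)\bigr)
  \quad \text{s.t.}\; x_i = x_j \;\;\forall\,(i,j) \in \mathcal{E}.
\]
```

The edge constraints make the local copies agree without any node ever sharing its private cost, which is what confines communication to the graph's edges.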
Publication date: 2017